Fast Nearest Neighbor Search in High-Dimensional Space

Authors

  • Stefan Berchtold
  • Bernhard Ertl
  • Daniel A. Keim
  • Hans-Peter Kriegel
  • Thomas Seidl

Abstract

Similarity search in multimedia databases requires efficient support for nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to computing the Voronoi cell of each data point. In a second step, we store the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, a nearest-neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates its high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest-neighbor search in the X-tree (up to a factor of 4).
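
The abstract's key step, turning a nearest-neighbor query into a point query by precomputing Voronoi cells, can be sketched in two dimensions. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the data space is taken to be the unit square, each Voronoi cell is computed exactly by clipping the square with the half-planes 2(p_j - p_i)·x <= ||p_j||^2 - ||p_i||^2 (one per other point p_j), and each cell is approximated by its minimum bounding rectangle (MBR). A linear scan over the MBRs stands in for the X-tree, and `clip_halfplane`, `voronoi_mbr`, and `VoronoiMBRIndex` are hypothetical names. Exact cell clipping like this does not scale to high dimensions; it only shows why the point query returns the right answer.

```python
import math
import random

# Assumption for this sketch: the data space is the unit square [0, 1]^2.
UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def clip_halfplane(poly, a, b):
    """Keep the part of the convex polygon `poly` where a . x <= b."""
    out = []
    n = len(poly)
    for k in range(n):
        cur, nxt = poly[k], poly[(k + 1) % n]
        fc = a[0] * cur[0] + a[1] * cur[1] - b
        fn = a[0] * nxt[0] + a[1] * nxt[1] - b
        if fc <= 0:
            out.append(cur)  # vertex lies inside (or on) the half-plane
        if (fc < 0 < fn) or (fn < 0 < fc):
            t = fc / (fc - fn)  # the edge crosses the line a . x = b
            out.append((cur[0] + t * (nxt[0] - cur[0]),
                        cur[1] + t * (nxt[1] - cur[1])))
    return out


def voronoi_mbr(points, i):
    """Clip the unit square to the Voronoi cell of points[i]; return its MBR."""
    px, py = points[i]
    cell = UNIT_SQUARE
    for j, (qx, qy) in enumerate(points):
        if j == i:
            continue
        # |x - p_i|^2 <= |x - p_j|^2  <=>  2 (p_j - p_i) . x <= |p_j|^2 - |p_i|^2
        a = (2 * (qx - px), 2 * (qy - py))
        b = (qx * qx + qy * qy) - (px * px + py * py)
        cell = clip_halfplane(cell, a, b)
    xs = [v[0] for v in cell]
    ys = [v[1] for v in cell]
    return (min(xs), min(ys), max(xs), max(ys))


class VoronoiMBRIndex:
    """Hypothetical index: one precomputed MBR per Voronoi cell.

    A linear scan over the MBRs stands in for a real multidimensional
    index such as the X-tree.
    """

    def __init__(self, points):
        self.points = list(points)
        self.mbrs = [voronoi_mbr(self.points, i) for i in range(len(self.points))]

    def insert(self, p):
        # Adding a point can only shrink existing cells, so the old MBRs
        # remain valid (if looser) supersets; only the new cell is computed.
        self.points.append(p)
        self.mbrs.append(voronoi_mbr(self.points, len(self.points) - 1))

    def nearest(self, q, eps=1e-12):
        """Return the index of the point nearest to q (q inside the unit square)."""
        # Point query: every cell whose MBR contains q is a candidate.
        cands = [i for i, (x0, y0, x1, y1) in enumerate(self.mbrs)
                 if x0 - eps <= q[0] <= x1 + eps and y0 - eps <= q[1] <= y1 + eps]
        # Refinement: q lies in its nearest neighbor's cell, so that point is
        # among the candidates; the exact distance picks it out.
        return min(cands, key=lambda i: math.dist(q, self.points[i]))


if __name__ == "__main__":
    random.seed(0)
    index = VoronoiMBRIndex([(random.random(), random.random()) for _ in range(50)])
    index.insert((0.5, 0.5))
    for _ in range(200):
        q = (random.random(), random.random())
        brute = min(math.dist(q, p) for p in index.points)
        assert math.dist(q, index.points[index.nearest(q)]) == brute
    print("point-query results match brute-force search")
```

Because q always lies in the Voronoi cell of its nearest neighbor, that point's MBR always contains q, so the point query may return extra candidates but never misses the answer. Likewise, inserting a point only shrinks existing cells, so this sketch keeps the old MBRs as looser supersets instead of recomputing them; that is one simple way to stay dynamic and not necessarily the strategy used in the paper.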

Similar resources

HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search

Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present “HDIdx”, an efficient high-dimensional indexing library for f...

Fast k-NN classification rule using metric on space-filling curves

A fast nearest neighbor algorithm for pattern classification is proposed and tested on real data. The patterns (points in d-dimensional Euclidean space) are sorted along a space-filling curve. This way, the multidimensional problem is reduced to the simplest case of nearest neighbor search in one dimension.

Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data

Similarity search in multimedia databases requires efficient support for nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...

Searching High-Dimensional Neighbours: CPU-Based Tailored Data-Structures Versus GPU-Based Brute-Force Method

Many image processing algorithms rely on the nearest neighbor (NN) or the k-nearest neighbor (kNN) search problem. Several methods have been proposed to reduce the computation time, for instance using space partitioning. However, these methods are very slow in high-dimensional spaces. In this paper, we propose a fast implementation of the brute-force algorithm using GPU (Graphics Processing Unit...

The Analysis of a Probabilistic Approach to Nearest Neighbor Searching

Let S be a set of n data points in some metric space. Given a query point q in this space, a nearest neighbor query asks for the point of S nearest to q. Throughout, we will assume that the space is real d-dimensional space ℝ^d and that the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has ap...

Publication date: 1998